library("survival")
library("survminer")
## Loading required package: ggplot2
## Loading required package: ggpubr
## 
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
## 
##     myeloma
library("viridis")
## Loading required package: viridisLite
library("plotly")
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library("htmlwidgets")
library("GGally")
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ✓ purrr   0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag()    masks stats::lag()
# Creating basic model:
surv1 <- survfit(Surv(time,status) ~ sex, data = lung)

# Getting data for plotly graph
p <- ggsurv(surv1, CI = T)
df <- data.frame(prob = p$data$surv, sex = p$data$group, 
                 upper = p$data$up, lower = p$data$low, time = p$data$time)
df$sex <- ifelse(df$sex == 1, "Male", "Female")
df$sex <- as.factor(df$sex)
df_wider <- df %>%
  pivot_wider(names_from = sex, values_from = prob)

df_wider$MaleUpper <- ifelse(!is.na(df_wider$Male), df_wider$upper, NA)
df_wider$MaleLower <- ifelse(!is.na(df_wider$Male), df_wider$lower, NA)
df_wider$FemaleUpper <- ifelse(!is.na(df_wider$Female), df_wider$upper, NA)
df_wider$FemaleLower <- ifelse(!is.na(df_wider$Female), df_wider$lower, NA)


# Plotly interactive graph:
p_i <- plot_ly() %>% 
  add_lines(x = df_wider$time, y = ~df_wider$MaleUpper, name = "Upper Bound", 
            line = list(width=.5, dash="dot", color = '#481567FF'), showlegend = FALSE
            , hovertemplate = paste('%{y}')) %>%
  add_lines(x = df_wider$time, y = ~df_wider$Male, name = "Male", line = list(color = '#481567FF') 
            , hovertemplate = paste('%{y}')) %>%
  add_lines(x = df_wider$time, y = ~df_wider$MaleLower, name = "Lower Bound", 
            line = list(width=.5, dash="dot", color = '#481567FF'), showlegend = FALSE
            , hovertemplate = paste('%{y}')) %>%
  
  add_lines(x = df_wider$time, y = ~df_wider$FemaleUpper, name = "Upper Bound", 
            line = list(width=.5, dash="dot", color = '#DCE319FF'), showlegend = FALSE
            , hovertemplate = paste('%{y}')) %>%
  add_lines(x = df_wider$time, y = ~df_wider$Female, name = "Female", line = list(color = '#DCE319FF') 
            , hovertemplate = paste('%{y}')) %>%
  add_lines(x = df_wider$time, y = ~df_wider$FemaleLower, name = "Lower Bound", 
            line = list(width=.5, dash="dot", color = '#DCE319FF'), showlegend = FALSE
            , hovertemplate = paste('%{y}')) %>%

  rangeslider() %>%
  
  layout(hovermode = "x unified", 
         title = "Survival Curves of Men and Women With Lung Cancer",
         xaxis = list(title = "Time (in days)"),
         yaxis = list(title = "Survival Probability"))

p_i

About the Plot

Data Source

This data source can be found in the “survival” r package and can be pulled from any machine just by calling this package in r. The description of the data and for each of the variables can be accessed via running “?lung” code. The description provided states: “Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.”

What I want to convey in the figure

I want to show the difference between survival rates between males and females with 95% confidence bands. I want the user to be able to easily see from a high level the difference between the two groups and further be able to compare between probabilities per a specified time range. Adding a time range and the ability to interact with the graph, more conveniently allows the user to zoom in to see where there is overlap and by what degree. The tooltip contains the upper and lower bounds of each of the groups to provide the numeric values.

What functionality and formatting you put into the figure and why

I first built a basic survival model to predict survival times by sex. I put this model in a ggsurv plot to grab the probabilities for each sex and each of the upper and lower confidence band values. Converting the ggsurv plot to a plotly graph did not allow the customization I needed in order to make the graph more readable, color coordinated, and tooltip friendly. I decided to directly use plotly to incorporate these features so I manipulated the data frame to be in wide format in order for plotly to be able to visualize everything I wanted. I created confidence bands for each of the sexes and made them dashed lines with smaller line widths. This way it easier to distinguish between the predicted lines and their respective confidence bands without getting too overwhelming. I also needed to have each line coordinated to a specific color in order to maintain consistency. Adding each of these lines cluttered the legend so I added code to hide the confidence bands and only keep the predicted lines. After adding in the hovermode function with “x” as the input, it was really hard to understand the graph. To fix this I changed the input to “x unified” so that there would only be one tooltip and then reordered the upper and lower bounds about the predicted lines. Lastly I added a title and x and y axes labels to make the graph interpretable.